This is a note on logistic regression models and logistic kernel machine models. It contains derivations of some of the expressions in [1].


Logistic regression models and score tests

For the $i$th individual ($i = 1, \ldots, n$), let the response $y_i$ be 0 if unaffected and 1 if affected. Let $X_i$ be a $q \times 1$ vector of covariates (including an intercept term), let $z_i$ be a $p \times 1$ vector of SNP genotypes (or summary scores) for the gene (SNP set) under test, and let $s_i$ be the environmental covariate, which is also included in $X_i$. We consider the logistic regression model with gene-environment interactions

$$\operatorname{logit}(p_i) = X_i^T\beta + \alpha_1^T z_i + s_i \cdot \alpha_2^T z_i, \qquad i = 1, \ldots, n, \qquad (1)$$


where $p_i = \Pr(y_i = 1 \mid X_i, z_i)$. The goal is to test the null hypothesis $H_0: \alpha_1 = \alpha_2 = 0$ of no genetic main or interaction effect. Consider the score statistic

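$$SS = \big((Y - \hat p)^T Z,\ (S\cdot Y - S\cdot \hat p)^T Z\big)\, I^{22}\, \big((Y - \hat p)^T Z,\ (S\cdot Y - S\cdot \hat p)^T Z\big)^T, \qquad (2)$$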

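where $Y = (y_1, \ldots, y_n)^T$, $S = (s_1, \ldots, s_n)^T$, $Z = (z_1, \ldots, z_n)^T$ is the $n \times p$ genotype matrix, and "$\cdot$" denotes element-wise multiplication. In addition, $\hat p = (\hat p_1, \ldots, \hat p_n)^T$, where the $\hat p_i$'s are the fitted values of the $p_i$'s under $H_0$. The middle matrix in (2) is the inverse of the efficient information matrix for $(\alpha_1, \alpha_2)$,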

$$I^{22} = \big(I_{22} - I_{21} I_{11}^{-1} I_{12}\big)^{-1},$$

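where, with $X = (X_1, \ldots, X_n)^T$,

$$I_{11} = X^T D_1 X, \qquad I_{12} = I_{21}^T = \big(X^T D_1 Z,\ X^T D_2 Z\big), \qquad I_{22} = \begin{pmatrix} Z^T D_1 Z & Z^T D_2 Z \\ Z^T D_2 Z & (S\cdot Z)^T D_1 (S\cdot Z) \end{pmatrix},$$

and $D_1 = \operatorname{diag}(\hat p_i(1 - \hat p_i))$, $D_2 = \operatorname{diag}(s_i\,\hat p_i(1 - \hat p_i))$. The $D_2$ blocks arise because $(S\cdot Z)^T D_1 A = Z^T D_2 A$ for any matrix $A$ with $n$ rows. Under $H_0$, $SS$ asymptotically follows a $\chi^2_v$ distribution, where $v$ is the rank of the matrix $(Z, S\cdot Z)$ and $S\cdot Z := (s_1 z_1, \ldots, s_n z_n)^T$.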
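As an illustration of (2), the following Python sketch computes $SS$ and its degrees of freedom $v$. It is a minimal sketch under stated assumptions, not an implementation from [1]: it assumes NumPy, the helper names `fit_null_logistic` and `gxe_score_test` are hypothetical, the null model $\operatorname{logit}(p_i) = X_i^T\beta$ is fitted by plain Newton-Raphson, and the Moore-Penrose pseudo-inverse replaces the inverse in $I^{22}$ so that the code still runs when $(Z, S\cdot Z)$ is rank deficient (the two coincide when the efficient information matrix is nonsingular).

```python
import numpy as np


def fit_null_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit logit(p_i) = X_i^T beta by Newton-Raphson; return fitted p_hat under H0."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: (X^T D_1 X)^{-1} X^T (Y - p)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-X @ beta))


def gxe_score_test(y, X, Z, s):
    """Score statistic SS of (2) for H0: alpha_1 = alpha_2 = 0, plus its df v."""
    p_hat = fit_null_logistic(X, y)
    SZ = s[:, None] * Z                        # S . Z: row i is s_i * z_i^T
    r = y - p_hat
    # Z^T (S.Y - S.p_hat) equals (S . Z)^T (Y - p_hat)
    U = np.concatenate([Z.T @ r, SZ.T @ r])
    d1 = p_hat * (1.0 - p_hat)                 # diagonal of D_1
    d2 = s * d1                                # diagonal of D_2
    I11 = X.T @ (d1[:, None] * X)
    I12 = np.hstack([X.T @ (d1[:, None] * Z), X.T @ (d2[:, None] * Z)])
    I22 = np.block([[Z.T @ (d1[:, None] * Z), Z.T @ (d2[:, None] * Z)],
                    [Z.T @ (d2[:, None] * Z), SZ.T @ (d1[:, None] * SZ)]])
    eff_info = I22 - I12.T @ np.linalg.solve(I11, I12)
    SS = float(U @ np.linalg.pinv(eff_info) @ U)
    v = int(np.linalg.matrix_rank(np.hstack([Z, SZ])))
    return SS, v


# Toy data simulated under H0 (no genetic or G x E effect).
rng = np.random.default_rng(0)
n = 200
s = rng.integers(0, 2, n).astype(float)               # binary environment
X = np.column_stack([np.ones(n), s])                  # intercept and s_i
Z = rng.binomial(2, 0.3, size=(n, 3)).astype(float)   # 3 SNPs coded 0/1/2
y = rng.binomial(1, 0.4, size=n).astype(float)
SS, v = gxe_score_test(y, X, Z, s)
print(f"SS = {SS:.3f}, df = {v}")
```

Under $H_0$, $SS$ would then be referred to a $\chi^2_v$ distribution; if SciPy is available, `scipy.stats.chi2.sf(SS, v)` gives the corresponding p-value.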

Logistic kernel machine models 

Following [2], [3], we now extend (1) to a semiparametric logistic regression model

$$\operatorname{logit}(p_i) = X_i^T\beta + h(z_i) + s_i \cdot g(z_i), \qquad i = 1, \ldots, n, \qquad (3)$$

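where $h(\cdot)$ and $g(\cdot)$ belong to the reproducing kernel Hilbert spaces $\mathcal{H}_K$ and $\mathcal{H}_{\tilde K}$ generated by the kernels $K(\cdot,\cdot)$ and $\tilde K(\cdot,\cdot)$, respectively. Using penalized likelihood, $h(\cdot)$ and $g(\cdot)$ can be estimated (jointly with $\beta$) by maximizing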



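$$\sum_{i=1}^n \Big\{ y_i \log\frac{p_i}{1 - p_i} + \log(1 - p_i) \Big\} - \frac{\lambda}{2}\|h\|_{\mathcal{H}_K}^2 - \frac{\tilde\lambda}{2}\|g\|_{\mathcal{H}_{\tilde K}}^2. \qquad (4)$$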
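Objective (4) is posed over function spaces. A common way to make it concrete, assumed here rather than stated in the text, is the representer theorem: the maximizers can be written so that $h(z_i) = (Ka)_i$ and $g(z_i) = (\tilde K b)_i$ for coefficient vectors $a, b \in \mathbb{R}^n$, with $\|h\|^2_{\mathcal H_K} = a^T K a$ and $\|g\|^2_{\mathcal H_{\tilde K}} = b^T \tilde K b$. The minimal NumPy sketch below (the function name `penalized_loglik` is hypothetical) only evaluates (4) in that parameterization; it does not maximize it.

```python
import numpy as np


def penalized_loglik(y, X, s, K, K_tilde, beta, a, b, lam, lam_tilde):
    """Evaluate objective (4) with h(z_i) = (K a)_i and g(z_i) = (K_tilde b)_i."""
    h = K @ a                        # h evaluated at z_1, ..., z_n
    g = K_tilde @ b                  # g evaluated at z_1, ..., z_n
    eta = X @ beta + h + s * g       # logit(p_i) under model (3)
    # y_i log(p_i / (1 - p_i)) + log(1 - p_i) = y_i * eta_i - log(1 + exp(eta_i))
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    penalty = 0.5 * lam * (a @ K @ a) + 0.5 * lam_tilde * (b @ K_tilde @ b)
    return loglik - penalty


# Example with linear kernels K = K_tilde = Z Z^T on toy data.
rng = np.random.default_rng(1)
n = 10
s = rng.normal(size=n)
X = np.column_stack([np.ones(n), s])
Z = rng.normal(size=(n, 2))
K = Z @ Z.T
y = rng.integers(0, 2, n).astype(float)
beta = np.array([0.2, -0.1])
a = rng.normal(scale=0.1, size=n)
b = rng.normal(scale=0.1, size=n)
print(penalized_loglik(y, X, s, K, K, beta, a, b, lam=1.0, lam_tilde=2.0))
```

Writing the log-likelihood term as $y_i\eta_i - \log(1 + e^{\eta_i})$ with `np.logaddexp` avoids computing $\log p_i$ and $\log(1 - p_i)$ separately, which can overflow or underflow for large $|\eta_i|$.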



Following [2], the resulting estimators have the same form as the penalized quasi-likelihood (PQL) estimators from the logistic mixed model:

$$\operatorname{logit}(p_i) = X_i^T\beta + h_i + s_i \cdot g_i, \qquad i = 1, \ldots, n, \qquad (5)$$






where $h = (h_1, \ldots, h_n)^T \sim N_n(0, \tau K)$ and $g = (g_1, \ldots, g_n)^T \sim N_n(0, \tilde\tau \tilde K)$ are independent, with $\tau = 1/\lambda$ and $\tilde\tau = 1/\tilde\lambda$.
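To make the covariance structure of (5) concrete, the sketch below draws one data set from the logistic mixed model. It assumes NumPy, uses the linear kernel $z_i^T z_j$ for $K$ and, purely as an illustrative choice not prescribed by the text, a Gaussian kernel $\exp(-\|z_i - z_j\|^2/\rho)$ for $\tilde K$; the function names and the bandwidth `rho` are hypothetical. Because $g_i$ enters the linear predictor multiplied by $s_i$, the vector $(s_1 g_1, \ldots, s_n g_n)^T$ has covariance $\tilde\tau\,\operatorname{diag}(S)\,\tilde K\,\operatorname{diag}(S)$, whose $(i, j)$ entry is $\tilde\tau s_i s_j \tilde K_{ij}$; this is the matrix $sK$ used in the test statistic below.

```python
import numpy as np


def linear_kernel(Z):
    """K(z_i, z_j) = z_i^T z_j for all pairs (rows of Z are the z_i)."""
    return Z @ Z.T


def gaussian_kernel(Z, rho):
    """Illustrative choice: K(z_i, z_j) = exp(-||z_i - z_j||^2 / rho)."""
    sq = np.sum(Z ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-dist2 / rho)


def simulate_model5(X, s, K, K_tilde, beta, tau, tau_tilde, rng):
    """Draw y from model (5): h ~ N_n(0, tau K), g ~ N_n(0, tau_tilde K_tilde)."""
    n = len(s)
    h = rng.multivariate_normal(np.zeros(n), tau * K)
    g = rng.multivariate_normal(np.zeros(n), tau_tilde * K_tilde)  # independent of h
    eta = X @ beta + h + s * g
    p = 1.0 / (1.0 + np.exp(-eta))
    return rng.binomial(1, p).astype(float)


rng = np.random.default_rng(2)
n, n_snps = 100, 4
s = rng.normal(size=n)
X = np.column_stack([np.ones(n), s])
Z = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
K = linear_kernel(Z)
K_tilde = gaussian_kernel(Z, rho=float(n_snps))
sK = np.outer(s, s) * K_tilde     # (s_i s_j K_tilde_ij) = diag(S) K_tilde diag(S)
y = simulate_model5(X, s, K, K_tilde, np.array([-0.5, 0.2]), 0.1, 0.1, rng)
```

Setting `tau = tau_tilde = 0` in this sketch recovers the null model $\operatorname{logit}(p_i) = X_i^T\beta$, which is exactly the hypothesis tested next.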

Testing the null hypothesis of no genetic effects, $H_0: h(\cdot) = g(\cdot) = 0$ in (3), can therefore be reformulated as testing for the absence of the variance components, $H_0: \tau = \tilde\tau = 0$, in model (5). As in [2], [3], we consider the following test statistic based on the score statistic for $(\tau, \tilde\tau)$:

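$$Q = \begin{pmatrix} Q_\tau \\ Q_{\tilde\tau} \end{pmatrix} = \frac{1}{2}\begin{pmatrix} (Y - \hat p)^T K\,(Y - \hat p) \\ (Y - \hat p)^T sK\,(Y - \hat p) \end{pmatrix}, \qquad (6)$$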

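where $\hat p$ is the vector of fitted values under $H_0$ and the $n \times n$ matrices are $K := (K(z_i, z_j))$, $\tilde K := (\tilde K(z_i, z_j))$, and $sK := (s_i s_j \tilde K_{ij}) = \operatorname{diag}(S)\,\tilde K\,\operatorname{diag}(S)$, which is the covariance matrix of $(s_1 g_1, \ldots, s_n g_n)^T$ up to the factor $\tilde\tau$. Under $H_0$, the approximate mean and covariance of $Q$ are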

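$$\mu_Q = \begin{pmatrix} \mu_\tau \\ \mu_{\tilde\tau} \end{pmatrix} = \frac{1}{2}\begin{pmatrix} \operatorname{tr}(P_0 K) \\ \operatorname{tr}(P_0\, sK) \end{pmatrix}, \qquad \Sigma_Q = \frac{1}{2}\begin{pmatrix} \operatorname{tr}(P_0 K P_0 K) & \operatorname{tr}(P_0 K P_0\, sK) \\ \operatorname{tr}(P_0\, sK\, P_0 K) & \operatorname{tr}(P_0\, sK\, P_0\, sK) \end{pmatrix},$$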

where

$$P_0 := W_0 - W_0 X (X^T W_0 X)^{-1} X^T W_0, \qquad W_0 = \operatorname{diag}\big(\hat p_i (1 - \hat p_i)\big).$$

We then linearly transform $Q$ so that its two components are uncorrelated with unit variance:

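$$Q^* = \begin{pmatrix} Q^*_\tau \\ Q^*_{\tilde\tau} \end{pmatrix} = \Sigma_Q^{-1/2} Q,$$

and $\mu^* = (\mu^*_\tau, \mu^*_{\tilde\tau})^T = \Sigma_Q^{-1/2}\mu_Q$. Since the components of $Q^*$ are quadratic forms, they can be approximated by scaled chi-square distributions $\kappa^*_\tau \chi^2_{\nu^*_\tau}$ and $\kappa^*_{\tilde\tau} \chi^2_{\nu^*_{\tilde\tau}}$, respectively [2]. A $\kappa\chi^2_\nu$ variable has mean $\kappa\nu$ and variance $2\kappa^2\nu$, while each component of $Q^*$ has variance 1, so matching the means and variances gives $\kappa^*_\tau = 1/(2\mu^*_\tau)$, $\kappa^*_{\tilde\tau} = 1/(2\mu^*_{\tilde\tau})$, $\nu^*_\tau = 2(\mu^*_\tau)^2$, and $\nu^*_{\tilde\tau} = 2(\mu^*_{\tilde\tau})^2$. Finally, we construct a combined test statistic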



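$$Q^*_{\max} = \max\left(\frac{Q^*_\tau}{\kappa^*_\tau},\ \frac{Q^*_{\tilde\tau}}{\kappa^*_{\tilde\tau}}\right). \qquad (7)$$

The corresponding p-value is then

$$\text{p-value} = 1 - F(Q^*_{\max};\, \nu^*_\tau)\, F(Q^*_{\max};\, \nu^*_{\tilde\tau}),$$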

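where $F(\cdot\,; \nu)$ is the cumulative distribution function of a chi-square distribution with $\nu$ degrees of freedom; the product form treats the two uncorrelated components of $Q^*$ as approximately independent. Note that when both $K$ and $\tilde K$ are linear kernels, i.e., $K(z_i, z_j) = \tilde K(z_i, z_j) = z_i^T z_j$, models (1) and (3) have the same form. However, they are treated differently (the genetic effects are fixed effects in (1) but random effects in (5), tested through $\tau$ and $\tilde\tau$), and consequently the corresponding test statistics are different.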
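The steps from $Q$ to the p-value can be collected into one sketch. It assumes NumPy plus the standard library; the function names are hypothetical; $\Sigma_Q^{-1/2}$ is taken to be the symmetric inverse square root, which is one valid choice of the linear transformation described above; and the chi-square CDF for non-integer degrees of freedom is computed from the regularized incomplete gamma function with the usual series and continued-fraction expansions. The moment matching requires $\mu^*_\tau, \mu^*_{\tilde\tau} > 0$, so the sketch raises an error otherwise rather than guessing how such cases should be handled.

```python
import math
import numpy as np


def fit_null_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit(p_i) = X_i^T beta; returns p_hat under H0."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        step = np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-X @ beta))


def chi2_cdf(x, df):
    """P(chi^2_df <= x) via the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    pref = math.exp(a * math.log(t) - t - math.lgamma(a))
    if t < a + 1.0:                      # power series for P(a, t)
        term = total = 1.0 / a
        k = 0
        while abs(term) > 1e-15 * total and k < 10000:
            k += 1
            term *= t / (a + k)
            total += term
        return pref * total
    tiny = 1e-300                        # Lentz continued fraction for Q(a, t)
    b = t + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    frac = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        frac *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return 1.0 - pref * frac


def kernel_gxe_test(y, X, s, K, K_tilde):
    """Combined variance-component test of H0: tau = tau_tilde = 0 in model (5)."""
    p_hat = fit_null_logistic(X, y)
    w = p_hat * (1.0 - p_hat)                         # diagonal of W_0
    WX = w[:, None] * X
    P0 = np.diag(w) - WX @ np.linalg.solve(X.T @ WX, WX.T)
    sK = np.outer(s, s) * K_tilde                     # (s_i s_j K_tilde_ij)
    r = y - p_hat
    Q = 0.5 * np.array([r @ K @ r, r @ sK @ r])
    A = (P0 @ K, P0 @ sK)
    mu = 0.5 * np.array([np.trace(A[0]), np.trace(A[1])])
    # tr(A_i A_j) computed as sum(A_i * A_j^T)
    Sigma = 0.5 * np.array([[np.sum(A[i] * A[j].T) for j in range(2)] for i in range(2)])
    vals, vecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T  # symmetric Sigma^{-1/2}
    Q_star, mu_star = Sigma_inv_sqrt @ Q, Sigma_inv_sqrt @ mu
    if np.any(mu_star <= 0):
        raise ValueError("moment matching needs positive mu*")
    kappa, nu = 1.0 / (2.0 * mu_star), 2.0 * mu_star ** 2
    Q_max = float(np.max(Q_star / kappa))             # statistic (7)
    p_value = 1.0 - chi2_cdf(Q_max, nu[0]) * chi2_cdf(Q_max, nu[1])
    return {"Q_star": Q_star, "mu_star": mu_star, "kappa": kappa, "nu": nu,
            "Q_max": Q_max, "p_value": p_value}


rng = np.random.default_rng(3)
n = 300
s = rng.normal(size=n)
X = np.column_stack([np.ones(n), s])
Z = rng.binomial(2, 0.3, size=(n, 5)).astype(float)
K = Z @ Z.T                                          # linear kernel for both K and K_tilde
y = rng.binomial(1, 0.3, size=n).astype(float)      # simulated under H0
res = kernel_gxe_test(y, X, s, K, K)
print(f"Q*_max = {res['Q_max']:.3f}, p-value = {res['p_value']:.3f}")
```

The example uses linear kernels for both $K$ and $\tilde K$, the case discussed in the note above, in which the kernel test and the score test (2) operate on models of the same form but produce different statistics.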